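Course Information
Course Title
Data Science and Social Inquiry (資料科學與社會研究)
Semester
112-1 (Fall 2023)
Offered To
College of Social Sciences, Graduate Institute of Economics
Instructor
陳由常
Course Number
ECON5166
Course Identifier
323 U1250
Class
 
Credits
3.0
Full/Half Year
Half year (one semester)
Required/Elective
Elective
Class Time
Wednesday, periods 6, 7, 8 (13:20-16:20)
Location
Social Sciences Building, Room 506
Notes
Required course for the undergraduate "Interdisciplinary Specialization in Data Science and Social Analysis."
Open only to undergraduates in their third year or above, master's students, or doctoral students.
Enrollment cap: 60 students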
 
Course Syllabus
To protect everyone's rights, please respect intellectual property and do not make illegal copies.
Course Overview

Please check

https://docs.google.com/document/d/1Va_CnqUgMtGCAO6hRUENvu4F7j2KTsWAXSBxKP0OZ0M/edit?usp=sharing

for detailed information. The problem set below can help you decide whether you are ready for this course (draft version; please do not start working on it yet):

https://drive.google.com/file/d/1fWoYhHQmbVyyyupOJQp73sZDZRD_PZsK/view?usp=sharing

---

Econ 5166 introduces “classical” machine learning (ML) methods, such as principal component analysis (PCA), LASSO, decision trees, and random forests, with a strong focus on their practical applications in social science research and business. The course is designed for students who have completed an introductory statistics course, have some hands-on experience manipulating data, and are keen to delve into the principles underlying machine learning.

Although NTU offers many excellent ML courses, Econ 5166 stands apart in two respects. First, the course emphasizes the relationship between ML and statistics. It shows how ML, like any data-based exploratory technique, fits into the broader statistical framework. We will illustrate the connections between fundamental statistical concepts and classical ML methods: correlation and PCA, OLS regression and LASSO, and hypothesis testing and classification, to name a few. The class is also an opportunity to revisit core statistical concepts (such as correlation and expectation) in light of real-world applications. Because our focus is on understanding the statistical underpinnings rather than merely the methodology, we will concentrate primarily on traditional, more accessible ML methods. Modern methods such as deep learning will be addressed conceptually, as extensions of what we learn in class.
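To make two of these connections concrete, here is a small NumPy sketch on simulated data. The helper names `pca_from_correlation` and `lasso_orthonormal` and all data are hypothetical illustrations, not course materials. For standardized variables, PCA amounts to an eigendecomposition of the correlation matrix; with two variables whose correlation is ρ, the eigenvalues are 1 + ρ and 1 - ρ. For LASSO, the sketch assumes the special case of an orthonormal design (XᵀX = I), where the LASSO solution is simply the OLS coefficient vector soft-thresholded by the penalty λ.

```python
import numpy as np

def pca_from_correlation(X):
    """PCA on standardized variables: eigen-decompose the sample correlation
    matrix of X and return eigenvalues (largest first) and eigenvectors (columns)."""
    R = np.corrcoef(X, rowvar=False)        # p x p sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)    # symmetric input; ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort from largest to smallest
    return eigvals[order], eigvecs[:, order]

def lasso_orthonormal(X, y, lam):
    """LASSO for an orthonormal design (X.T @ X = I), i.e. the minimizer of
    0.5 * ||y - X b||^2 + lam * ||b||_1: soft-threshold the OLS coefficients."""
    b_ols = X.T @ y                         # OLS solution when X.T @ X = I
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

rng = np.random.default_rng(0)

# Correlation and PCA: two simulated variables with population correlation 0.8.
n = 5_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
X = np.column_stack([x1, x2])
eigvals, eigvecs = pca_from_correlation(X)
rho = np.corrcoef(X, rowvar=False)[0, 1]
# Eigenvalues equal 1 + rho and 1 - rho; the first direction is (1, 1)/sqrt(2) up to sign.
print("eigenvalues:", eigvals, "  1 +/- rho:", 1 + rho, 1 - rho)
print("first principal direction:", eigvecs[:, 0])

# OLS and LASSO: an orthonormal design built from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(200, 5)))   # columns of Q are orthonormal
beta = np.array([3.0, -2.0, 0.5, 0.0, 0.0])
y = Q @ beta + 0.1 * rng.normal(size=200)
b_ols = Q.T @ y
b_lasso = lasso_orthonormal(Q, y, lam=1.0)
print("OLS:  ", b_ols.round(2))
print("LASSO:", b_lasso.round(2))   # each coefficient shrinks by 1.0 or becomes exactly 0
```

The orthonormal case is used here only because it has a closed form; with a general design matrix, LASSO has no closed-form solution and is usually computed iteratively (for example, by coordinate descent).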

The second distinctive feature of this course is an in-depth exploration of ML applications in social science research and, to a lesser extent, business. Our primary goal is to equip you with practical ML skills for tackling real-world problems. To that end, each method we discuss will be motivated by business applications, followed by an analysis of a research paper or one of my own research projects that demonstrates the relevance of ML. In addition, an integral part of this course is a project in which you will refine your skills in problem formulation, coding for data analysis, precise interpretation of statistical results, and effective communication of your findings. This hands-on approach not only solidifies your theoretical understanding but also strengthens your ability to apply ML methods in real-world scenarios.

Course Objectives
1. Developing a working knowledge of machine learning methods: Students will build an intuitive understanding of the principles behind various machine learning methods through their mathematical definitions and algorithms, and will apply this knowledge in actual data analysis, such as feature selection and the interpretation of results.

2. Understanding machine learning algorithms through statistics: Students will use basic concepts such as conditional expectation to grasp the statistical meaning of these algorithms (for example, cross-validation). At the same time, the course revisits basic statistical concepts such as correlation, regression analysis, and hypothesis testing from a practical, application-oriented perspective.

3. Cultivating data processing skills: Students will learn a range of data processing skills, from data cleaning, ETL (extract, transform, load), web crawling, and data visualization to the development of data products, as well as how to verify data reliability and check for potential errors in the analysis process.

4. Fostering basic data science literacy: Students will cultivate the essential abilities of a data scientist, such as mathematical maturity and a command of statistics, and will refine their scientific problem-solving skills in the final project. They will also learn how data science is applied in business settings and gain a preliminary understanding of the division of labor and the skills required across various job functions.
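As an illustration of the cross-validation example in objective 2, here is a minimal NumPy sketch of k-fold cross-validation, assuming simulated data and two hypothetical candidate predictors (the training-sample mean and a simple linear regression); the function names are illustrative, not course code. Cross-validation estimates the expected out-of-sample squared error E[(Y - f(X))²], which is minimized by the conditional expectation f(X) = E[Y | X]. Because the linear predictor matches E[Y | X] in the simulated data, its cross-validated error should come out close to the noise variance of 1.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Randomly partition the indices 0..n-1 into k folds of (almost) equal size."""
    return np.array_split(rng.permutation(n), k)

def cv_mse(x, y, fit, predict, k=5, seed=0):
    """k-fold cross-validation estimate of the out-of-sample mean squared error."""
    rng = np.random.default_rng(seed)
    fold_errors = []
    for test_idx in kfold_indices(len(y), k, rng):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False                 # hold out one fold
        model = fit(x[train], y[train])         # fit on the remaining k - 1 folds
        resid = y[test_idx] - predict(model, x[test_idx])
        fold_errors.append(np.mean(resid ** 2))
    return float(np.mean(fold_errors))          # average over the k held-out folds

# Candidate 1: predict every Y by the training-sample mean (ignores x).
def fit_mean(x, y):
    return y.mean()

def predict_mean(model, x):
    return np.full(len(x), model)

# Candidate 2: simple linear regression of y on x.
def fit_ols(x, y):
    return np.polyfit(x, y, deg=1)              # [slope, intercept]

def predict_ols(model, x):
    return np.polyval(model, x)

# Simulated data in which E[Y | X = x] = 1 + 2x and the noise variance is 1.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=500)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=500)

mse_mean = cv_mse(x, y, fit_mean, predict_mean)
mse_ols = cv_mse(x, y, fit_ols, predict_ols)
print(f"CV MSE, mean predictor: {mse_mean:.2f}")
print(f"CV MSE, linear predictor: {mse_ols:.2f}")
```

The held-out folds play the role of fresh draws from the data-generating process, which is why the cross-validated error approximates expected prediction error rather than in-sample fit.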


 
Course Requirements
1. Homework
2. Midterm
3. Final Project Presentation 
Expected Weekly Study Hours Outside Class
Office Hours
 
Required Readings
To be announced
References
Murphy, K. P. (2022). Probabilistic Machine Learning: An Introduction. MIT Press.
Grading
(For reference only)
   
Accommodations for Students Facing Difficulties
 
Class Format
Supplemented by video recordings
Assignment Submission
Exam Format
Other
Course Schedule
Week  Date   Topic
1     9/06   Introduction
2     9/13   Principal Component Analysis
3     9/20   Principal Component Analysis
4     9/27   Factor Analysis
5     10/04  Clustering
6     10/11  Clustering
7     10/18  Project Discussion (No Class)
8     10/25  Penalized Regression
9     11/01  Penalized Regression
10    11/08  Penalized Regression
11    11/15  Midterm
12    11/22  Project Discussion (No Class)
13    11/29  Tree Algorithms
14    12/06  Tree Algorithms
15    12/13  Tree Algorithms
16    12/20  Final Project Rehearsal (No Class; Graded)
17    12/27  Project Presentation